ABSTRACT


Pfizer researchers reported in 2018 that lipophilic efficiency (LipE) is an
important metric that is increasingly being applied in medicinal chemistry
drug discovery programs. In that perspective, drug discovery examples were
used to show how LipE can guide medicinal chemistry lead optimization toward
candidate drugs with exceptional in vivo efficacy and safety potential,
especially when optimization is guided by physicochemical properties. In
general, most medicinal chemists consider only potency and try to increase it
during hit and lead optimization or when studying the structure-activity
relationship. However, lipophilicity should be considered in conjunction with
potency variations to ensure both the safety (drug-likeness) and the efficacy
of the candidate drug. Therefore, the aim of this study is to identify
promising leads against 3CL-pro and optimize them for maximum potency and
safety in COVID-19 treatment using a design strategy approach. 3CL-pro
inhibitors with lipophilic efficiency and related bioactivity data were
obtained from the ChEMBL database and analyzed based on the relationship
between LipE and logP (lipophilicity). The 2D physicochemical descriptors of
the compounds were calculated. A Quantitative Structure-Activity Relationship
(QSAR) model was built and the bioactivities of novel compounds were
predicted, while the molecular mechanism was inferred by docking assay. Based
on the analysis, 80 novel compounds were generated; 6 of them (36, 37, 46, 47,
77 and 79) showed an increase in both LipE and potency with a decrease in
logP, which makes them promising alternatives to existing 3CL-pro inhibitors
in the treatment of COVID-19.
1. INTRODUCTION


From a global public health and socioeconomic perspective, there is an urgent need for a rapid intervention that
includes effective vaccines and antiviral drugs to stop the spread of the current pandemic virus [1], which causes coronavirus
disease 2019 (COVID-19), declared a pandemic by the World Health Organization [2, 3]. The infection affects the liver, the
respiratory and central nervous systems, and the digestive system of humans and animals. The present name, SARS-CoV-2, was agreed
upon by scientists because of its 82% similarity with the known Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV) [4-6]. The
SARS-CoV-2 main protease (Mpro), also known as the chymotrypsin-like (CL) protease (3CLpro), is a non-structural protein that
plays a crucial role in cleaving the virus-encoded polyproteins for replication and maturation (Coro 21). It is therefore
regarded as an attractive target for the development of efficacious antiviral drugs against SARS-CoV-2 [7, 8].

The 3CLpro homodimer enzyme (EC: 3.4.22.69; optimal activity at pH 7.5 and 42 °C) is largely conserved among members of
Coronaviridae, with approximately 40-44% sequence homology. It has three structural segments: domain I (residues 8-101)
and domain II (residues 102-184), which both have beta-barrel motifs and form the chymotrypsin-like catalytic region, and
domain III (residues 185-200), which has a helical structure and participates in protein dimerization and enzyme activation [9-13].
Designing potent drugs against SARS-CoV-2 infection requires an understanding of the protein-ligand interaction mechanism.
Protein-ligand interactions play an important role in structure-based drug design, and enhancing the steric compatibility of drug
agents is an effective strategy for generating energy-efficient binding of drug agents to target receptors [14].

Leeson and Springthorpe [16] initially proposed ligand lipophilicity efficiency (LLE or LipE) [15], and various studies
have proven it to be an effective and direct means of evaluating the quality of research compounds.
The safety of compounds is assessed by relating lipophilicity to potency. LipE seeks to maximize in vitro potency per unit of
lipophilicity or, in basic terms, to enhance activity while keeping lipophilicity low [16]. LipE is estimated as ligand potency
expressed on a negative logarithmic scale (pKd, pKi or pEC50) minus logP (or logD) [17]. In practice, instead of measuring logP,
the computed value clogP is commonly used together with the most pertinent in vitro potency to predict in vivo efficacy [18].
LipE can be used to recognize small molecules that would otherwise be overlooked because of their low potency and lipophilicity.
This is advantageous because size and lipophilicity usually increase during lead optimization. Thus, LipE-guided selection
makes small, less lipophilic compounds a beneficial starting point [17, 19, 20].
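The LipE calculation can be illustrated with a minimal Python sketch. It assumes the convention LipE = potency - logP, which reproduces the values reported for the dataset in this study (for example pIC50 4.4 and AlogP 2.11 give LipE 2.29). The function name `lipe` is chosen here for illustration only.

```python
def lipe(potency: float, logp: float) -> float:
    """Lipophilic efficiency: potency on a -log10 scale (pIC50, pKi, pKd or pEC50)
    minus lipophilicity (logP or logD)."""
    return potency - logp


# Values taken from the dataset in Figure 1 (pIC50, AlogP).
lipe_high = round(lipe(4.4, 2.11), 2)   # low logP, moderate potency
lipe_low = round(lipe(5.52, 6.3), 2)    # high potency, high logP
print(lipe_high, lipe_low)
```

The more potent compound ends up with the lower LipE because its potency is outweighed by its high lipophilicity, which is the situation the metric is meant to expose.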

Recently, LipE has gained popularity as a direct and important means of modulating lipophilicity, with examples demonstrating
its use in drug optimization. Notable examples include CB2 agonists [24], CB2 agonists/CB1 agonists
[25], soluble epoxide hydrolase inhibitors [35], dual PI3K/mTOR inhibitors [27], HIV non-nucleoside reverse transcriptase
inhibitors [28], ATP-competitive Akt inhibitors [26], the design of a potent cyclin-dependent kinase 2 (CDK2) inhibitor [21, 22] and
a protein kinase B inhibitor [23] (see [29, 30] for reviews). Hence, this study aims to select successful leads as starting points
for optimization in order to design more potent and drug-like clinical candidates as inhibitors of 3CL-pro.


2. MATERIALS AND METHODS 


2.1. Collection of data and dataset groundwork 

The trivial name, source, lipophilic efficiency metric (LipE), lipophilicity (AlogP) and biological activity (IC50) of
3CLpro inhibitors were obtained from the ChEMBL database. A total of 15 coronavirus 3C-like proteinase inhibitors were retrieved
on the basis of available chemical structures with corresponding bioactivities (IC50) (Figure 1). The IC50 values were subsequently
converted to pIC50 using the expression pIC50 = -log10(IC50), with IC50 expressed as a molar concentration. The chemical structures
of the 3CLpro inhibitors were downloaded from the ChEMBL database as SMILES, which were then converted to 2D SDF format in
DataWarrior v5.0.0 [31].
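The IC50-to-pIC50 conversion can be sketched as follows. This is a minimal example that assumes the IC50 values are supplied with an explicit concentration unit; the unit table and the function name `pic50` are illustrative and not part of the original workflow.

```python
import math

# Conversion factors to molar concentration (assumed unit labels).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50(ic50: float, unit: str = "uM") -> float:
    """Return pIC50 = -log10(IC50 in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


# An IC50 of 3 uM corresponds to a pIC50 of about 5.52.
print(round(pic50(3, "uM"), 2))
```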


2.2. Analysis of Lipophilic efficiency 


The lipophilic efficiency of the retrieved dataset was analyzed in relation to the corresponding potency (pIC50) and
lipophilicity (logP) using DataWarrior v5.0.0.
2.3. 2D QSAR study

The potency of a compound must be related to molecular descriptors in order to build a QSAR model [32]. Accordingly,
CDK descriptor version 1.0 was used to compute descriptors in the following groups: hybrid features,
constitutional properties, topological properties, electronic properties and geometric descriptors. The computed descriptors were
organized in a matrix and preprocessed with JFrameVWSP version 1.0, which removed invariant (constant) columns and applied
a variance cut-off of 0.0001 and a correlation coefficient cut-off of 0.99. The dataset of
28 molecules retrieved from the literature [33] was separated into test and training sets using the Kennard-Stone method [34].
QSAR model validation is essential to assess how reliable a developed model is [35]. This is usually achieved by evaluating the
internal stability and predictive ability of the QSAR model. In this study, the QSAR model was validated internally using the
leave-one-out (LOO) method. In each round, one molecule was removed, the model was rebuilt from the remaining molecules and
used to predict the removed molecule, and the resulting Q² (rCV²) value was recorded. Equation (1) was used to
calculate rCV² (the cross-validation regression coefficient), which describes the internal stability of the model.


$$ r_{CV}^{2} = 1 - \frac{\sum (Y_{obs} - Y_{pred})^{2}}{\sum (Y_{obs} - \bar{Y})^{2}} \qquad (1) $$


Here Ȳ represents the average activity value of the training dataset, while Ypred and Yobs stand for the predicted and
observed activity values, respectively. An rCV² greater than 0.5 indicates a reasonably robust model [36].
Following internal validation, the predictive power of the QSAR model was estimated from an external test set of
compounds not used in building the model. The external predictive capacity was quantified by the
predictive R² (R²pred) based on equation (2).


$$ R_{pred}^{2} = 1 - \frac{\sum (Y_{pred(test)} - Y_{(test)})^{2}}{\sum (Y_{(test)} - \bar{Y}_{(training)})^{2}} \qquad (2) $$

Y(test) and Ypred(test) are the observed and predicted activity values for the test set compounds, respectively, and
Ȳ(training) is the average bioactivity of the compounds in the training set.
An R²pred greater than 0.6 indicates acceptable predictive power for the test set molecules [37-39].
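The two validation statistics can be written out in a few lines of Python. This is a sketch of the formulas only, under the usual definitions (one minus the ratio of the prediction error sum of squares to the total sum of squares); it assumes the leave-one-out and test-set predictions have already been produced by the modelling software, and the function names are illustrative.

```python
def q2_loo(observed, loo_predicted):
    """Cross-validated r2 (equation 1): observed values against
    leave-one-out predictions for the training set."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, loo_predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - press / ss_total


def r2_pred(test_observed, test_predicted, training_mean):
    """External predictive r2 (equation 2): test-set errors scaled by the
    spread of the test set around the training-set mean activity."""
    press = sum((o - p) ** 2 for o, p in zip(test_observed, test_predicted))
    ss_total = sum((o - training_mean) ** 2 for o in test_observed)
    return 1 - press / ss_total


# Toy numbers (not from the study): perfect predictions give 1.0,
# predicting the mean everywhere gives 0.0.
print(q2_loo([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(q2_loo([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```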


The MLR method was used to develop the QSAR model to examine potential leads against 3CLpro from the modelling
dataset (28 compounds). The CDK algorithm was used to calculate a total of 108 molecular descriptors for each compound.
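The Kennard-Stone split mentioned above can be sketched in plain Python. This is an illustrative implementation of the standard algorithm (start from the two most distant samples, then repeatedly add the sample farthest from those already selected), not the code used in the study; the descriptor vectors in the example are hypothetical.

```python
import math


def kennard_stone(points, n_train):
    """Return (train_indices, test_indices) chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the training set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Pick the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Hypothetical 2D descriptor vectors for four compounds.
train, test = kennard_stone([(0, 0), (1, 0), (2, 0), (10, 0)], n_train=3)
print(sorted(train), test)
```

Because the selected samples span the descriptor space, the training set tends to cover the extremes while the test set falls inside that range.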


2.4. Molecular docking 

The three-dimensional crystal structure of the target, SARS coronavirus 3C-like proteinase (3CLpro) (PDB: 1UJ1), was retrieved
from the Protein Data Bank [40]. Discovery Studio 2017 R2 was used to remove all heteroatoms and PyMOL to remove
non-essential water molecules. After receptor and ligand preparation, molecular docking was performed with the AutoDock Vina
option in PyRx, using its scoring function and exhaustive search docking.
After energy minimization, the grid box was centered at 76.0065 × -11.5107 ×
18.0445 on the x, y and z axes, respectively, with grid dimensions of 25 × 25 × 25 Å to specify the binding site (Figure 18a).
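For readers who run AutoDock Vina from the command line rather than through PyRx, the search box described above maps onto Vina's configuration keys. The sketch below only builds the configuration text; the exhaustiveness value of 8 is Vina's default and is an assumption here, since the study does not report the value used, and the function name is illustrative.

```python
def vina_box_config(center, size, exhaustiveness=8):
    """Build the search-box part of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",  # assumed, not reported in the study
    ]
    return "\n".join(lines)


# Grid box centre and dimensions (in angstroms) as reported in this section.
config_text = vina_box_config((76.0065, -11.5107, 18.0445), (25, 25, 25))
print(config_text)
```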


ISSN 2278-540X EISSN 2278-5396 | OPEN ACCESS 




DRUG DISCOVERY | RESEARCH ARTICLE 


Compound   pIC50   AlogP   LipE
1          5.52    6.3     -0.78
2          5       6.01    -1.01
3          4.96    7.05    -2.09
4          4.92    7.05    -2.13
5          4.85    5.42    -0.57
6          4.82    5.18    -0.36
7          4.82    6.86    -2.04
8          4.82    5.84    -1.02
10         4.4     2.11    2.29
11         4.4     5.06    -0.66
12         4.35    3.6     0.75
13         4.22    3.66    0.56
14         4.22    4.44    -0.22
15         4       4.67    -0.67


Figure 1: Raw data with corresponding bioactivities (IC50)


3. RESULTS AND DISCUSSION 
Figure 2 shows the relationship between potency (pIC50), lipophilicity (logP) and lipophilic efficiency (LipE) of the 3CL-pro inhibitors,
which is useful in lead selection and optimization as discussed in the sections below:


3.1. Iso-LipE 3CLpro inhibitors 

Multiple combinations of pIC50 and logP can lead to the same LipE. For this reason, knowing the LipE of a
molecule alone is not informative [50]. It is necessary to interpret the 3CLpro inhibitors in a LipE plot to gain insight into the
steps required for optimization-oriented design. In our analysis, compound 2 had a pIC50 value of 5 with a logP of
6.01, resulting in a LipE of -1.01 (Figure 3). Although compound 8 is slightly less potent, with a pIC50 value of 4.82, it had a logP of
5.84, which resulted in a comparable lipophilic efficiency of -1.02.

If only potency is considered, compound 2 appears superior to compound 8. In terms of LipE, however, the two compounds are
isoefficient and comparably attractive for follow-up. In addition, because compounds 2 and 8 have logP values of 6.01
and 5.84, respectively, both of which exceed the optimum for ADME (Absorption, Distribution, Metabolism, and Excretion)
properties (logP < 5), neither is a better starting point than the other. Similarly, compound 11 had a pIC50 value of 4.4 with a
logP of 5.06, resulting in a lipophilic efficiency of -0.66, while compound 15, with a comparable pIC50 value of 4, had a
logP of 4.67, resulting in a similar lipophilic efficiency of -0.67 (Figure 3). Given that compound 11 had a logP of 5.06, which exceeds
the optimum for ADME properties (logP < 5), compound 15 is the better starting point, even though the two compounds have
broadly similar pIC50 values. This highlights the importance of analyzing LipE changes in the context of logP when
selecting a lead compound [41].
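The iso-LipE comparison above can be reproduced with a short Python sketch. The pIC50 and logP values are those quoted in this section; the tolerance of 0.05 LipE units used to call two compounds isoefficient and the function names are assumptions made for illustration.

```python
from itertools import combinations

# compound number -> (pIC50, logP), as quoted in the text
DATA = {2: (5.0, 6.01), 8: (4.82, 5.84), 11: (4.4, 5.06), 15: (4.0, 4.67)}


def iso_lipe_pairs(data, tol=0.05):
    """Return pairs of compounds whose LipE values differ by at most tol."""
    pairs = []
    for (a, (pa, la)), (b, (pb, lb)) in combinations(sorted(data.items()), 2):
        if abs((pa - la) - (pb - lb)) <= tol:
            pairs.append((a, b))
    return pairs


def within_logp_limit(data, pair, logp_limit=5.0):
    """Return the members of a pair whose logP is below the ADME guideline."""
    return [c for c in pair if data[c][1] < logp_limit]


pairs = iso_lipe_pairs(DATA)
for pair in pairs:
    print(pair, within_logp_limit(DATA, pair))
```

For the pair 2 and 8 neither compound passes the logP filter, whereas for the pair 11 and 15 only compound 15 does, matching the reasoning in the text.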
Figure 2: Relationship between pIC50 and logP of 3CL-pro inhibitors in relation to their respective LipE

Figure 3: Isoefficient 3CL-pro inhibitors


3.2. Iso-potent Efficiency Changes of 3CLpro inhibitors 


Unlike compounds 2 and 8 or 11 and 15, compounds with comparable pIC50 can have different LipE. Without considering logP they
would look alike, but the compounds with lower logP possess a higher lipophilic efficiency [42]. This is an
isopotent efficiency change. It is illustrated by compounds 6 and 8 and by compounds 13 and 14, in which a reduction in logP is seen
while pIC50 is maintained, which leads to an increase in LipE (Figure 4). Within each pair the compounds have the same pIC50,
although logP across the four compounds ranges from 3 to 6. Of these compounds, 13 has an appreciably higher LipE than the
others, making it a valuable lead that would have been lost in the subset of equipotent compounds without a LipE-based analysis.

Figure 4: Isopotent 3CL-pro inhibitors


3.3. Isolipophilic LipE Changes of 3CL-pro inhibitors 

When logP remains invariant while pIC50 changes across a set of derivatives, the result is an isolipophilic change in LipE.
Isolipophilic variation in efficiency typically occurs with pairs of enantiomers, although it is not limited to such
examples [42]. This is observed for compounds 12 and 13, in which an increase in LipE is observed while lipophilicity is maintained,
as a result of a slight increase in potency (Figure 5). Compounds 12 and 13 have essentially the same logP, with
pIC50 values of 4.35 and 4.22, respectively. Compound 12 has a slightly higher LipE, making it the preferable lead. Because logP
remains the same, an assessment of isolipophilic LipE variations cannot be distinguished from a potency-centric analysis.

Figure 5: Isolipophilic 3CL-pro inhibitors


Decreasing logP while improving potency results in a very large increase in lipophilic efficiency [42-44]. This
was pursued by generating novel compounds through structural modification of a starting chemical entity. In the
search for a starting chemical entity from our dataset, compound 1 (with the highest potency) and compound 10 (with the lowest
logP value) were compared: their LipE values are -0.78 and 2.29, respectively, so that moving from compound 1 to compound 10
gives an efficiency increase of about 3 units accompanied by a decrease in both potency and logP (Figure 6). Given that compound
1 had a logP of 6.3, which is above the optimum for ADME properties (logP < 5), compound 10, with a logP of 2.11, is the better
starting chemical entity.

Figure 6: Compounds 10 and 1, showing an efficiency increase with a decrease in potency and logP


Quantitative structure-activity relationship (QSAR) analysis is a successful method that has been adopted in rational drug
design and in understanding the mechanism of drug action. In QSAR studies, the bioactivities of compounds are expressed as a
function of various physicochemical properties, which clarifies how variation in biological activity depends on changes in chemical
structure [45]. These physicochemical properties are quantified in the form of descriptors. We therefore designed a set of new
compounds from compound 10 with a de novo design approach using DataWarrior v5.0.0. DataWarrior uses an evolutionary
method that mimics nature by randomly applying small changes to known molecular structures to produce new
generations of possibly better structures. Each generation of molecules is tested for fitness against a set of adjustable
criteria, and the most promising structures serve as the starting point for the next generation. The mutation algorithm performs
changes such as bond order changes, ring aromatisations, atom replacements, atom insertions and substituent migrations, to
mention a few.


[Figure 7 plot: fitness score (approximately 0.955 to 0.995) against generation]


Figure 7: Novel compounds generated based on the structure of the parent molecule (compound 10) (a = Compound 10; b = Compound 17)


Every structure to be mutated is first evaluated for all possible mutations with respect to how strongly each alteration will
increase or decrease drug-likeness. Mutations that alter the structure in the required direction are assigned a higher probability
than mutations that reduce drug-likeness. Mutations that would create high ring strain are eliminated from the list. Based on this
approach, 81 structures that retain the scaffold of compound 10 were created, each with a corresponding fitness score (Figure 7).
Compound 17 of the second generation has the highest fitness score (0.98817), which makes it the closest to the original structure
(compound 10, fitness score = 1.000).

Tsai et al. (2006) [33] reported a novel family of 28 SARS-CoV protease inhibitors, with their corresponding IC50 values, that
share the same scaffold as compound 10 (Figure 8). To predict the bioactivities of the newly generated compounds, a QSAR
model was built from these 28 compounds, which have the same scaffold as the novel compounds.


[Figure 8 panel: 2D chemical structures of the 28 compounds (numbered 1-28), each annotated with its pIC50 value]


Figure 8: Retrieved chemical structures with the same scaffold as compound 10


To analyze the diversity of the test and training sets, Principal Component Analysis (PCA) was performed using the
structural descriptors computed for the entire data set. This approach helped to identify homogeneities
in the data and to describe the spatial location of the samples, which assisted in dividing the data into training and test sets.

The PCA yielded three principal components (PC1, PC2 and PC3) that together explain 99.998% of the total variance:
PC1 = 41.975%, PC2 = 33.541% and PC3 = 24.482%. Because the first three principal components account for almost all of
the variance, the score plot is a dependable representation of the spatial distribution of points for the whole data set. The
distribution of the compounds over the space of the first three principal components is shown by the plot of PC1, PC2 and PC3
(Figure 9), with PC1 and PC2 covering the largest variability in the data (Figure 10). This figure shows that the samples of the
test and training sets are uniformly scattered in the three-dimensional space, so that it was possible to divide the dataset.
In addition, the compounds in the training set are representative of the whole dataset.

Figure 9: Principal component analysis of the test and training sets
The next step after analyzing the separation of the dataset into test and training sets was to identify the
factors that are most important for the SARS-CoV protease inhibition activity of the 28 inhibitors. A genetic algorithm (GA) was
applied to the training dataset as the variable selection method, so as to retain only the most relevant combination of descriptors
and obtain the model with the highest predictive power. The three (3) most important descriptors according to the GA approach
are SCH-5, C1SP3 and khs.ddsN, based on a variance cut-off of 0.01 and an inter-correlation cut-off of 0.9. In this study, these
descriptors were found to be related to the studied biological response and are described in Table 1.

In general, the quality of a QSAR model is described by its predictive ability. The techniques adopted in this study were
used to correlate the physicochemical descriptors of the 28 3CL-pro inhibitors with their inhibitory activity. The physicochemical
descriptors were taken as independent variables and the inhibitory activity against the 3CL-pro target as the dependent variable.
The data set of 28 compounds was divided into a training set of 19 compounds, which was used to construct the model, and a
test set of 9 compounds, which was used to test the constructed model. The
resulting linear equation based on MLR is as follows (equation 3):


$$ pIC_{50} = 4.52085\,(\pm 0.2233) - 0.2688\,(\pm 0.08067)\,\text{C1SP3} + 4.92257\,(\pm 0.79286)\,\text{SCH-5} - 0.51868\,(\pm 0.18816)\,\text{khs.ddsN} \qquad (3) $$


R² (regression coefficient) = 0.82776

SEE (standard error of estimate) = 0.28645

SDEP (standard deviation of error of prediction) = 0.3829

r² (LOO) (leave-one-out cross-validation) = 0.7077

PRESS (predicted residual sum of squares) = 1.23081

F (variance ratio) = 24.02849 (DF: 3, 15), r²pred (external predictive power) = 0.80913
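Applying the fitted model to a new compound amounts to evaluating the linear expression in equation (3). The sketch below uses the reported intercept and coefficients; the descriptor values passed in the example are hypothetical and serve only to show the call.

```python
# Intercept and coefficients of the GA-MLR model, as reported in equation (3).
INTERCEPT = 4.52085
COEFFICIENTS = {"C1SP3": -0.2688, "SCH-5": 4.92257, "khs.ddsN": -0.51868}


def predict_pic50(descriptors):
    """Predict pIC50 from a dict holding the three selected descriptor values."""
    return INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical descriptor values for one compound (not taken from the study).
example = {"C1SP3": 1.0, "SCH-5": 0.0, "khs.ddsN": 0.0}
print(round(predict_pic50(example), 5))
```

The signs of the coefficients match the contributions listed in Table 1: larger SCH-5 values raise the predicted pIC50, while larger C1SP3 and khs.ddsN values lower it.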
Figure 11: Experimental and predicted bioactivities of the training and test sets

Equation (3) shows that the model established with GA-MLR has an excellent squared correlation coefficient (R²),
with both internal and external predictive power (r²pred) also having excellent values. The QSAR model derived by
GA-MLR shows a noteworthy relationship between the dependent variable (pIC50 values) and the carefully chosen descriptors
(independent variables).

An 82.8% correlation exists between the activity and the selected descriptors in the training dataset, based on the value of the
regression coefficient (R² = 0.82776). Meanwhile, the value of the cross-validation regression coefficient (Q² = 0.6468) suggests a
prediction accuracy of ~64.7% for this QSAR model.

Figure 11 shows the predicted and observed biological activities of the training and test datasets. Figure 12 plots the predicted
pIC50 against the experimental pIC50 and reveals good agreement between the predicted and experimental activity
values.


Table 1: GA-selected descriptors with their corresponding descriptions

S/N  Descriptor  Description                                     Contribution
1    C1SP3       Singly bound carbon bound to one other carbon   Negative contribution
2    SCH-5       Simple chain, order 5                           Positive contribution
3    khs.ddsN    Description not available in the database       Negative contribution


[Figure 12 plot: predicted pIC50 against observed pIC50 for the training and test sets; r = 0.884 (Bravais-Pearson)]

Figure 12: The predicted pIC50 values using MLR modeling against the experimental (observed) pIC50 values

The PC1-PC2 loadings plot for the three descriptors of the best model (GA-MLR) is shown in Figure 13. From the loadings, the
compounds with greater biological activity values, situated on the right side, are most strongly associated with
the SCH-5 descriptor, which is located on the same side as in Figure 8. Conversely, compounds with lower biological activity
values, on the left side, have more distinct contributions from the other descriptors (mostly khs.ddsN and C1SP3).
[Figure 13 plot: descriptor loadings on PC1 (41.975%) and PC2 (33.541%); labelled descriptors include SCH-5 and khs.ddsN]


Figure 13: PC1-PC2 loadings plot using the three descriptors for the best model (GA-MLR)


A Y-randomization test was performed to validate the QSAR model and to confirm that the relationship with the selected
descriptors is not due to chance correlation. Random MLR models were created by randomly shuffling the dependent variable
while keeping the independent variables unchanged. Models built from the shuffled data are expected to be of low statistical
quality, with low R² and Q² values over repeated trials, which confirms that the developed QSAR model is robust.
Approximately one hundred Y-randomization trials were conducted in this study, and they gave lower values of R² and Q²,
thereby validating the original model (the established GA-MLR model) (Figure 14).
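The idea behind Y-randomization can be shown with a small, self-contained Python sketch. For brevity it uses a single descriptor and ordinary least squares instead of the three-descriptor GA-MLR model of this study, and the data are synthetic; only the shuffling logic is the point of the example.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor least-squares fit (equals the squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, trials=100, seed=0):
    """Refit the model on shuffled activities and collect the R2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        shuffled = y[:]          # keep the descriptor values fixed
        rng.shuffle(shuffled)    # break the descriptor-activity link
        scores.append(r_squared(x, shuffled))
    return scores


# Synthetic data: activity depends linearly on the descriptor plus small noise.
x = [0.1 * i for i in range(1, 11)]
y = [4.0 + 2.0 * xi + (0.05 if i % 2 else -0.05) for i, xi in enumerate(x)]

true_r2 = r_squared(x, y)
random_r2 = y_randomization(x, y)
mean_random_r2 = sum(random_r2) / len(random_r2)
print(round(true_r2, 3), round(mean_random_r2, 3))
```

A real model should score far above the shuffled models; if the shuffled fits reach similar R² values, the original correlation is likely to be a chance result.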


[Figure 14 plot: R² and Q² (LOO) against iteration for the Y-randomized models, with the R² and Q² (LOO) of the original model shown for comparison]


Figure 14: R² (train) and Q² (LOO) values from the Y-randomization tests for GA-MLR

The residuals of the predicted pIC50 values for the training and test sets are plotted against the experimental pIC50 values
in Figure 15. The model shows no proportional or systematic error, since the residuals are randomly scattered about
the horizontal line.
[Figure 15 plot: experimental pIC50 against residual]


Figure 15: The residuals against the experimental pIC50 for the GA-MLR model


Furthermore, the QSAR model was used to predict the bioactivities of the novel compounds from their respective 2D
physicochemical descriptors (SCH-5, khs.ddsN and C1SP3). The compound in generation 0 is the parent molecule
(compound 10), which has a predicted pIC50 value of 4.49. Twenty compounds across generations 3-10 have predicted pIC50 values
higher than that of the parent molecule, which makes them potentially more potent than the parent molecule (Figure 16). clogP was
computed for each novel compound using DataWarrior v5.0, and LipE was derived from the predicted pIC50 and clogP.

Figure 16: De novo generated compounds with higher predicted bioactivities
An increase in LipE can be attained by modifying the ligand to change its lipophilicity, its potency or both. The largest gain in
LipE generally occurs when modifications improve potency and concurrently lower lipophilicity [33]. Based on our design strategy,
6 of the novel compounds (36, 37, 46, 47, 77 and 79) showed an increase in both LipE and potency with a decrease in logP (Figure 17).


[Figure 17 plot: predicted pIC50 against cLogP, with markers coded by cLipE and by generation]


Figure 17: Optimized compounds based on bioactivities, logP and LipE


To better understand the molecular mechanism underlying the action of the compounds selected on the basis of their
bioactivities and LipE, molecular docking was employed. The six selected
compounds and their parent molecule were docked into the binding pocket of SARS coronavirus 3C-like proteinase to assess their
inhibitory (antagonistic) properties (Figure 18a and b). Compound 77 showed a slightly better binding affinity (-7.4 kcal/mol)
than the parent molecule (compound 10; -7.3 kcal/mol) and is thus considered the lead compound (Figure 19).


Figure 18: (a) Grid box within which the ligand binds, centered at 76.0065 × -11.5107 × 18.0445 along the x, y and z axes,
respectively (b) Compounds within the binding pocket


The best binding affinity of -7.4 kcal/mol, obtained for compound 77, is attributed to chemical interactions at
the receptor's active site (Figure 20a), which include five (5) hydrogen bonds involving residues GLN110, ASN151 and HIS246 and
three (3) hydrophobic interactions involving residues VAL104, LEU202 and ILE249. By comparison, compound 10, serving as the
reference molecule, shows the following interactions at the binding pocket (Figure 20b): four (4) hydrogen bonds
involving residues ASN151, ASP295 and THR111 and two (2) hydrophobic interactions involving ILE249 and PRO293. The
higher binding affinity of compound 77 within the druggable pocket of SARS coronavirus 3C-like proteinase is therefore attributed
to the presence of more hydrophobic interactions and more hydrogen bonds than for compound 10.


[Figure 19 plot: binding affinity (kcal/mol) against generation for the parent molecule and the selected compounds, with markers coded by pIC50 and cLipE]


Figure 19: Binding affinities of selected 3CL-pro inhibitors


Hydrogen (H) bonds support varied cellular functions by facilitating molecular interactions. In other words, hydrogen
bonds are considered facilitators of protein-ligand binding [48, 49]. Previous studies have shown that cooperative
receptor-ligand H-bond pairings promote high-affinity binding [50]. Additionally,
ASN151 was predicted to be consistently involved in hydrogen bonding with the ligands within the druggable pocket of
SARS coronavirus 3C-like proteinase.


[Figure 20 legend: conventional hydrogen bond, carbon hydrogen bond, pi-donor hydrogen bond]


Figure 20: 3D and 2D interactions of (a) compound 77 and (b) compound 10


4. CONCLUSION 


This study has used the combination of LipE and logP to select potent leads against SARS coronavirus 3C-like proteinase
and to optimize these leads based on changes in their physicochemical properties, in order to discover highly efficient and
reliable clinical candidates for the treatment of COVID-19. Six novel compounds (36, 37, 46, 47, 77 and 79) are suggested for
further in vitro and preclinical testing based on their predicted efficiency values and potency.